Finding Diagnostic Biomarkers in Proteomic Spectra
Abstract
In seeking diagnostic biomarkers in proteomic spectra, two significant problems arise. First, there is noise not only in the measured intensity at each m/z value but also in the measured m/z value itself. Second, the potential for overfitting is severe: it is easy to find features in the spectra that accurately discriminate disease states but have no biological meaning. We address these problems by developing and testing a series of steps for pre-processing proteomic spectra and extracting putatively meaningful features before presenting them to feature selection and classification algorithms. These steps include a hidden Markov model (HMM) based latent spectrum extraction algorithm that fuses the information from multiple replicate spectra obtained from a single tissue sample, a simple baseline correction algorithm based on a segmented convex hull, a peak identification and quantification algorithm, and a peak registration algorithm that aligns peaks from multiple tissue samples into common peak registers. We apply these steps to MALDI spectral data collected from normal and tumor lung tissue samples, and then compare two approaches: feature selection by false discovery rate (FDR) control followed by classification with a support vector machine (SVM), versus joint feature selection and classification with Bayesian sparse multinomial logistic regression (SMLR). SMLR outperformed FDR+SVM, but both achieved good diagnostic accuracy with a small number of features. Some of the selected features have previously been investigated as clinical markers for lung cancer diagnosis; some of the remaining features are excellent candidates for further research.
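The abstract names a baseline correction "based on a segmented convex hull" without giving details. The Python sketch below shows one plausible reading of that idea, not the authors' implementation. It splits the index range into a fixed number of contiguous segments (the `n_segments` parameter and its default are assumptions), takes the lower convex hull of each segment as the local baseline, interpolates the hull back onto every point, and subtracts it. The function names `lower_hull` and `segmented_hull_baseline` are hypothetical, and the data are synthetic.

```python
import numpy as np


def lower_hull(x, y):
    """Indices of the lower convex hull of the points (x[i], y[i]); x must be ascending."""
    hull = []
    for i in range(len(x)):
        # Andrew's monotone chain: drop the last vertex while it lies on or above
        # the chord from the vertex before it to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return hull


def segmented_hull_baseline(mz, intensity, n_segments=10):
    """Hypothetical baseline: lower convex hull computed separately in each segment."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    baseline = np.empty_like(intensity)
    # Contiguous, roughly equal-length segments of the index range
    for seg in np.array_split(np.arange(len(mz)), n_segments):
        x, y = mz[seg], intensity[seg]
        h = lower_hull(x, y)
        # Linear interpolation between hull vertices gives the baseline in this segment
        baseline[seg] = np.interp(x, x[h], y[h])
    return baseline


# Synthetic example (illustrative data only): linear drift plus two Gaussian peaks
mz = np.linspace(1000.0, 2000.0, 1001)
drift = 50.0 - 0.02 * (mz - 1000.0)
peaks = (100.0 * np.exp(-((mz - 1350.0) ** 2) / 18.0)
         + 60.0 * np.exp(-((mz - 1750.0) ** 2) / 18.0))
raw = drift + peaks
corrected = raw - segmented_hull_baseline(mz, raw)
```

Because the hull never rises above the data, the corrected spectrum is non-negative. A known weakness of any segmented scheme is that a peak straddling a segment boundary can be partly absorbed into the baseline, so the segment count matters.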
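Peak identification and quantification are likewise only named in the abstract. As an illustrative stand-in, not the paper's algorithm, the sketch below scans a baseline-corrected spectrum for local maxima that exceed a multiple of a robust noise estimate (a scaled median absolute deviation, which is an assumption). It extends each peak down both flanks until the signal stops decreasing and then quantifies the peak by its height and trapezoidal area. `find_peaks_simple` and its `snr` parameter are hypothetical names.

```python
import numpy as np


def find_peaks_simple(mz, y, snr=5.0):
    """Hypothetical peak picker returning (apex m/z, height, area) tuples."""
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(y, dtype=float)
    # Robust noise level: scaled median absolute deviation of the signal
    noise = 1.4826 * np.median(np.abs(y - np.median(y)))
    threshold = snr * max(noise, np.finfo(float).eps)
    peaks = []
    for i in range(1, len(y) - 1):
        # Apex: rises from the left, does not rise to the right, clears the threshold
        if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] > threshold:
            lo = i
            while lo > 0 and y[lo - 1] < y[lo]:
                lo -= 1  # walk down the left flank
            hi = i
            while hi < len(y) - 1 and y[hi + 1] < y[hi]:
                hi += 1  # walk down the right flank
            xs, ys = mz[lo:hi + 1], y[lo:hi + 1]
            # Trapezoidal area under the peak between its flank minima
            area = float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))
            peaks.append((float(mz[i]), float(y[i]), area))
    return peaks


# Synthetic, already baseline-corrected spectrum (illustrative data only)
mz = np.linspace(1000.0, 2000.0, 1001)
signal = (100.0 * np.exp(-((mz - 1350.0) ** 2) / 18.0)
          + 60.0 * np.exp(-((mz - 1750.0) ** 2) / 18.0))
found = find_peaks_simple(mz, signal)
```

On real spectra the flank walk stops at the first noise dip, so areas are conservative. Smoothing the spectrum first, or using a fixed-width integration window, are common alternatives.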
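The abstract also mentions a peak registration step that aligns peaks from different tissue samples into common registers, but it does not describe the method. One simple way to realise the idea is shown here as an assumption rather than the authors' algorithm. The sketch pools all detected peaks, sorts them by m/z, and starts a new register whenever the gap to the previous peak exceeds a relative tolerance. The 500 ppm default is arbitrary, and it absorbs the m/z measurement noise the abstract describes. The output is a samples-by-registers intensity matrix that a feature selector can consume. `register_peaks` is a hypothetical name.

```python
import numpy as np


def register_peaks(peak_lists, tol_ppm=500.0):
    """Group peaks from several samples into common m/z registers (hypothetical helper).

    peak_lists: one list of (mz, height) pairs per sample.
    Returns (centres, matrix) where matrix[s, r] is sample s's height in register r.
    """
    pooled = sorted((mz, s, h) for s, peaks in enumerate(peak_lists) for mz, h in peaks)
    groups = []
    for mz, s, h in pooled:
        # Join the current register if the gap to the previous peak is within tolerance
        if groups and (mz - groups[-1][-1][0]) <= tol_ppm * 1e-6 * mz:
            groups[-1].append((mz, s, h))
        else:
            groups.append([(mz, s, h)])
    centres = np.array([np.mean([p[0] for p in g]) for g in groups])
    matrix = np.zeros((len(peak_lists), len(groups)))
    for r, g in enumerate(groups):
        for mz, s, h in g:
            # If one sample contributes two peaks to a register, keep the larger
            matrix[s, r] = max(matrix[s, r], h)
    return centres, matrix


# Three samples whose peaks are slightly shifted in m/z (illustrative values)
samples = [
    [(1350.0, 10.0), (1750.0, 5.0)],
    [(1350.3, 12.0)],
    [(1349.8, 9.0), (1750.4, 6.0)],
]
centres, matrix = register_peaks(samples)
```

This gap-based (single-linkage) grouping is easy to reason about, but with a generous tolerance it can chain neighbouring peaks into a single register. A sample missing a peak simply gets zero in that column.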
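For the FDR+SVM arm, the abstract reports FDR-based feature selection without naming the procedure. The Benjamini-Hochberg step-up procedure is the standard way to control the false discovery rate, so the sketch below assumes it. It takes one p-value per peak register (for example from a two-sample test between normal and tumor groups, also an assumption) and returns a mask of the registers retained at FDR level `q`. `benjamini_hochberg` is a hypothetical helper name.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg procedure at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with its critical value k * q / m
    below = p[order] <= q * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if below.any():
        # Step-up rule: reject every hypothesis up to the largest k that passes
        k = np.max(np.nonzero(below)[0])
        mask[order[:k + 1]] = True
    return mask


# Illustrative p-values for ten peak registers
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216])
selected = benjamini_hochberg(pvals, q=0.05)
```

The selected columns of the registered intensity matrix would then be passed to the classifier. Because of the step-up rule, a p-value can be retained even if it fails its own critical value, provided a larger p-value passes.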
Similar articles
Protein mass spectra data analysis for clinical biomarker discovery: a global review
The identification of new diagnostic or prognostic biomarkers is one of the main aims of clinical cancer research. In recent years there has been a growing interest in using high throughput technologies for the detection of such biomarkers. In particular, mass spectrometry appears as an exciting tool with great potential. However, to extract any benefit from the massive potential of clinical pr...
A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum
Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...
MALDI-TOF MS combined with magnetic beads for detecting serum protein biomarkers and establishment of boosting decision tree model for diagnosis of systemic lupus erythematosus.
OBJECTIVES To discover novel potential biomarkers and establish a diagnostic pattern for SLE by using proteomic technology. METHODS Serum proteomic spectra were generated by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) combined with weak cationic exchange magnetic beads. A training set of spectra, derived from analysing sera from 32 patients with...
Application of adjusted-receiver operating characteristic curve analysis in combination of biomarkers for early detection of gestational diabetes mellitus
Introduction: In the medical diagnostic field, evaluating the diagnostic accuracy of biomarkers or tests has always been a matter of concern. In some situations, one biomarker alone may not be sufficiently sensitive and specific for predicting a disease. However, combining multiple biomarkers may lead to better diagnosis. The aim of this study was to assess the performance of combination of bio...
Clinical Significance of Salivary Biomarkers in Oral Squamous Cell Carcinoma: A Review
Background and Aim: Oral squamous cell carcinoma (OSCC) accounts for approximately 3% of all cancers worldwide, and if diagnosed early, it has a five-year survival rate of around 85%; however, a late diagnosis may decrease the survival rate to 50%. Aberrant expression of several genes is associated with the hallmarks of OSCC including uncontrolled cell proliferation, poor differentiation, invas...
Journal title:
- Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
Publication date: 2006